(54) Context switching technique for processors with large register files



(57) A computer system and a method for operating a
processor are described. The method includes the steps
of establishing a first register save area (209a) and a
second register save area (209b) in a memory (107),
where each register save area holds data values that
define a context. The first context is loaded in the
processor (102) by loading at least some of the data
values from the first register save area into the
plurality of registers (203). A first pointer value to
the first register save area is stored in a current
register file save area (RFSA) register (204). A context
switch is indicated by storing a second pointer to the
second register save area in the current RFSA register.
The first pointer is transferred from the current RFSA
register to a previous RFSA register. All of the data
values that define the first context are transferred
from the registers to a shadow register file (207). The
second context is established in the processor by
loading selected data values from the second register
file save area into the plurality of registers.

Description

1. Field of the Invention.

[0001] The present invention relates, in general, to 
data processors, and, more particularly, to a method, 
apparatus and system for implementing context switch- 
ing in processors with large register files. 

2. Relevant Background. 

[0002] Computer systems include central processing
units (CPUs), microcontroller units (MCUs), and the like 
coupled with memory. Programs that run on such com- 
puter systems operate on data that may be stored and 
retrieved by the program or supplied at run-time. Pro- 
grams include a plurality of saved instructions that de- 
fine particular operations that are to be performed on 
the data. 

[0003] Most processor architectures generally define 
a plurality of registers for holding the data to be operated 
on by the program instructions. These registers may be 
implemented as hardware registers or as register files 
in general purpose memory. The registers store both the 
instructions and data that are being or may be used by
the processor. The registers are usually implemented in 
memory devices that are closely coupled with the proc- 
essor to provide low-latency access to data required by 
the processor. The registers are typically defined in the 
processor architecture specification and so are usually 
considered part of the processor architecture even 
where they are physically implemented in another de- 
vice. 

[0004] High speed processors may have tens or hun- 
dreds of registers in a general register file. The large 
number of registers can enable the processor to process 
a large amount of data concurrently and to load or store 
data from longer latency storage before it is needed. 
Very long instruction word (VLIW) processors tend to 
have higher register number requirements because of 
the inherent parallelism of VLIW that results in more 
concurrent operations. The higher register number re- 
quirements place correspondingly higher bandwidth 
and response time requirements on the memory bus 
that transfers data between memory and the registers. 
It can take multiple memory bus clock cycles to transfer 
data into or remove data from all the available registers 
in a processor. 

[0005] As a more specific example, multi-tasking and
multi-threading processor architectures enhance data
processing efficiency in many applications. In such
architectures, software programs executing on the
processor are segmented into atomic "threads" that
execute on the processor. To ensure architectural
integrity, each thread is normally guaranteed access to
the entire register set defined by the processor
architecture even if the thread only uses a fraction of
that register set.

[0006] A particular thread's instructions and data,
together with the architectural registers that store
that data, are referred to as a "context". A "context
switch" occurs when the architectural resources are
switched from one thread to another thread. A context
switch occurs, for example, when one thread becomes
inactive or is terminated and the processor resources
are applied to another active thread. A context switch
also occurs, for example, when an executing thread
accesses a resource that has a long latency, or when a
thread with higher priority than the current thread is
imposed on the processor. When a context switch occurs,
the data in the registers is moved out of the registers
and saved to persistent storage (or some other memory
location). Data for the new context is then transferred
into the registers.

[0007] One way to organize registers within a processor
is to use a register windowing technique to access a
plurality of registers in a register file. With register
windowing, a register window has a predetermined number
of contiguous registers, and the window can be moved
linearly within the register file. At any one time, the
register window permits program access to a subset of
the total number of registers in the register file.
Control registers are also associated with the register
windows so that a program can manipulate the position of
the window within the register file and monitor the
status of the window.

[0008] For example, in the specification for a scalable
processor architecture, SPARC-V9, the general purpose
registers for storing and manipulating data are arranged
in register sets accessible through register windows,
each register window having 32 registers. A particular
processor can have multiple register sets ranging from
three register sets to 32 register sets. Individual
registers are addressable using a five-bit address in
conjunction with a current window pointer (CWP). The
register window is movable within the register sets such
that a program can logically address multiple physical
registers in the register sets by simply tracking a
logical register name or specifier and the current
window pointer.
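By way of illustration only, the combination of a five-bit register address and a current window pointer can be sketched as follows. The layout below (eight global registers shared by all windows, sixteen registers owned by each window, and the "out" registers of one window aliasing the "in" registers of the next) is a simplifying assumption made for this sketch; the function name `physical_index` and the direction of the overlap are hypothetical and are not taken from the text above.

```python
# Simplified register-window mapping (illustrative assumptions only).
NUM_GLOBALS = 8        # logical r0..r7, shared by every window
REGS_PER_WINDOW = 16   # 8 "in" + 8 "local" registers owned by each window


def physical_index(logical: int, cwp: int, num_windows: int) -> int:
    """Map a five-bit logical register number to a physical register index."""
    if not 0 <= logical < 32:
        raise ValueError("logical register number must fit in five bits")
    if not 0 <= cwp < num_windows:
        raise ValueError("current window pointer out of range")
    if logical < 8:
        # Globals are not windowed.
        return logical
    if logical < 16:
        # Outs alias the ins of the next window (assumed direction).
        window, offset = (cwp + 1) % num_windows, logical - 8
    elif logical < 24:
        # Locals are private to the current window.
        window, offset = cwp, 8 + (logical - 16)
    else:
        # Ins occupy the first half of the current window's block.
        window, offset = cwp, logical - 24
    return NUM_GLOBALS + window * REGS_PER_WINDOW + offset
```

Under these assumptions a program names only 32 registers at a time, while the physical file holds 8 + 16 x (number of windows) registers.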

[0009] In prior implementations, the entire register
file is purged in response to a context switch, and the
register file is initialized for a new process. If the
new process is itself a saved process, the register
values are restored from storage before the context
takes effect. Because of this, two memory operations,
one to write the old context to storage and a second to
read the new context from storage, may be required for
each context switch. For VLIW architectures, this
situation creates an undesirable number of memory
transactions that constitute overhead to the fundamental
data processing performance of the processor. For
example, in an architecture providing 256 registers, up
to 512 memory transactions may be required to implement
a context switch. This setup may be required in prior
systems even where only a few of the 256 registers were
actually used by the current process and where only a
few of the registers will be used by the new process. A
need exists for a processor architecture that provides
low overhead manipulation of a large register file.
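The arithmetic behind the 512-transaction figure can be sketched as below, assuming one memory transaction per register saved and one per register restored. The function names and the second function, which counts transactions if only the registers actually touched were transferred, are hypothetical and serve only to make the comparison concrete.

```python
def eager_switch_transactions(num_registers: int) -> int:
    """Worst case: every register is written out, then every register is read in."""
    return 2 * num_registers


def selective_switch_transactions(modified_by_old: int, used_by_new: int) -> int:
    """Cost if only modified registers are saved and only used registers are restored."""
    return modified_by_old + used_by_new
```

For a 256-register file, `eager_switch_transactions(256)` gives 512, whereas a switch between two threads that touch, say, 10 and 12 registers respectively would need only 22 transfers under the selective scheme.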
[0010] Another limitation of existing processors is that
during a context switch, all of the processor's resources
are dedicated to completing the context switch. Other 
operations are blocked until the new context is in place. 
This type of operation decreases the efficiency of the 
processor because every operation is stalled until all of 
the old context's registers are saved (including registers 
that were not used) and all of the new context's registers 
are initialized or restored (including registers that will not 
be used). 

SUMMARY OF THE INVENTION 

[0011] Briefly stated, the present invention solves
these and other limitations by saving only registers that
have been modified in response to a context switch.
Further, during a context switch, the new context's
registers are dynamically loaded from its context record
when the register is used. In this manner, no overhead
penalty is incurred for registers that are
architecturally specified but not used by the thread.
Also, the context saving process is performed in
accordance with the present invention in parallel with
other operations in the new context, minimizing the
impact of context switching on processor performance.

[0012] In another aspect, the present invention
involves a method for operating a processor including the
steps of establishing a first register save area and a
second register save area in a memory, where each
register save area holds data values that define a
context. The first context is loaded in the processor by
loading at least some of the data values from the first
register save area into the plurality of registers. A
first pointer value to the first register save area is
stored in a current RFSA register. A context switch is
indicated by storing a second pointer to the second
register save area in the current RFSA register. The
first pointer is transferred from the current RFSA
register to a previous RFSA register. All of the data
values that define the first context are transferred
from the registers to a shadow register file. The second
context is established in the processor by loading
selected data values from the second register file save
area into the plurality of registers.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0013] 

FIG. 1 shows in block diagram form a computer system
implementing context switching in accordance
with the present invention; and

FIG. 2 shows in block diagram form components of 
a context switching apparatus in accordance with 
the present invention. 



DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0014] FIG. 1 illustrates in block diagram form a
computer system incorporating an apparatus and system in
accordance with the present invention. Processor
architectures and computing systems are usefully
represented as a collection of interacting functional
units as shown in FIG. 1. These functional units,
discussed in greater detail below, perform the functions
of fetching instructions and data from memory,
processing fetched instructions, managing memory
transactions, interfacing with external I/O and
displaying information.

[0015] The present invention is described in terms of
an apparatus and method particularly useful in a
computer system 100 such as shown in FIG. 1. FIG. 1 shows
a typical general purpose computer system 100
incorporating a processor 102 and implementing both an
application program and an operating system executing in
processor 102. Computer system 100 in accordance with
the present invention comprises a system bus 101 for
communicating information and a processor 102 coupled
with bus 101 through input/output (I/O) devices within
processor 102. Processor 102 is coupled to memory system
107 using a memory bus 103 to store information and
instructions for processor 102. Memory system 107
comprises, for example, one or more levels of cache
memory and main memory in memory unit 107. It should be
understood that some cache memory is included on-chip
with processor 102 in most applications in addition to
cache and memory in memory system 107.
[0016] User I/O devices 106 are coupled to bus 101
and are operative to communicate information in
appropriately structured form to and from the other
parts of computer 100. User I/O devices may include a
keyboard, mouse, magnetic or tape reader, optical disk,
or other available I/O devices, including another
computer. Mass storage device 117 is coupled to bus 101
and may be implemented using one or more magnetic hard
disks, magnetic tapes, CD ROMs, large banks of random
access memory, or the like. A wide variety of random
access and read-only memory technologies are available
and are equivalent for purposes of the present
invention. Mass storage 117 includes computer programs
and data stored therein. Some or all of mass storage 117
may be configured to be incorporated as part of memory
system 107. Processor 102 includes a number of registers
203 (shown in FIG. 2) that are typically implemented in
hardware. Data processing occurs by loading data into
registers 203, modifying data in registers 203, and
storing data from registers 203 back out to memory 107.

[0017] A processor 102 with a large general register
file 202 (e.g., 256 registers) carries with it the
potential for 512 memory operations on a context switch.
Many of these will be unnecessary and so may lengthen the
context switch time considerably. The present invention
seeks to reduce this time by only loading and saving
registers that are in fact needed, and to perform the
saving operations in parallel with other instructions.
[0018] Referring to FIG. 2, an execution unit 201
within processor 102 accesses (i.e., loads, stores, and
modifies) data stored in general register file 202.
General register file 202 comprises a plurality of
registers 203 where the number of registers 203 is
specified by the processor architecture. Each register
203 includes at least one, and typically more than one,
storage cell, each storing one bit of information. The
registers 203 in general register file 202 are
alternatively referred to as "architectural" register
files. Each register 203 is accessible by execution unit
201 to enable processor 102 to load data from a selected
register 203 and store data to a selected register 203.

[0019] The contents of registers 203 in aggregate
define a "context" or the processor state during
execution of a particular program thread or process. The
context is saved persistently in memory 107 in a register
save area 209a or 209b so that it can be restored on
demand. Although only two register save areas 209a and
209b are shown, each context has a corresponding save
area 209. Hence, there may be hundreds or thousands of
register save areas 209.

[0020] As shown in FIG. 2, a current register file save
area (RFSA) register 204 contains data pointing to an
area of memory 107 where the current thread's register
values are saved. The term "current thread" means a
program thread currently executing in execution unit
201. In a particular example, RFSA 204 contains a pointer
to a start location or first memory location of a
sequential block of memory holding all of the values from
registers 203. Where the number of registers 203 in
register file 202 is specified by the processor
architecture, only the first register address is needed
to access all of the registers 203 individually.
Alternatively, the register values can be accessed from
memory 107 in groups comprising some but less than all of
the registers. A new value is written into current RFSA
register 204 at the initiation of a context switch.
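The statement that only the first register address is needed can be illustrated with a simple address calculation. This is a sketch only: the register width of eight bytes, the default register count, and the name `save_slot_address` are assumptions introduced here and are not specified above.

```python
REGISTER_BYTES = 8  # assumed register width; not specified in the text


def save_slot_address(rfsa_base: int, register_index: int,
                      num_registers: int = 256) -> int:
    """Address of one register's slot inside a sequential register save area."""
    if not 0 <= register_index < num_registers:
        raise IndexError("register index outside the architectural register file")
    # The RFSA register supplies the base; the index supplies the offset.
    return rfsa_base + register_index * REGISTER_BYTES
```

With a base of 0x1000 held in the current RFSA register, register 3 would be found at 0x1018 under these assumptions.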

[0021] In accordance with the present invention, each
of registers 203 includes or is associated with a valid
bit (labeled "V" in FIG. 2) indicating whether data of
the register is effective data or not. When current RFSA
register 204 is written to, indicating a context switch,
all the general registers 203 are marked as invalid. If a
register 203 is written to when in the invalid state, the
associated valid bit is then marked valid.

[0022] When a register 203 is read while in the invalid
state, processor 102 automatically loads the value from
the register file save area 209a of memory 107. When the
value is loaded, the corresponding valid bit is marked
valid. Each register 203 is also associated with a dirty
bit (labeled "D" in FIG. 2) that indicates that the value
stored in the register file save area of memory 107 for
that register 203 is out of sync with the current
register 203. Registers having a dirty bit set are
referred to herein as "dirty registers".
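The behaviour of the valid and dirty bits described above can be modelled in a few lines. The class below is a software sketch, not the hardware: the name `LazyRegisterFile`, the use of a Python list to stand in for the register save area, and the `loads` counter are all hypothetical. Setting the dirty bit on every write is an inference from the definition of the dirty bit rather than an explicit statement in the text.

```python
class LazyRegisterFile:
    """Toy model of registers 203 with per-register valid (V) and dirty (D) bits."""

    def __init__(self, save_area):
        n = len(save_area)
        self.save_area = save_area   # stands in for register save area 209a
        self.values = [0] * n
        self.valid = [False] * n     # all registers start invalid
        self.dirty = [False] * n
        self.loads = 0               # number of loads from the save area

    def read(self, index):
        if not self.valid[index]:
            # Reading an invalid register triggers an automatic load.
            self.values[index] = self.save_area[index]
            self.valid[index] = True
            self.loads += 1
        return self.values[index]

    def write(self, index, value):
        # Writing makes the register valid without any load, and leaves the
        # save area out of sync, so the dirty bit is set (assumed behaviour).
        self.values[index] = value
        self.valid[index] = True
        self.dirty[index] = True
```

In this model a register that is written before it is ever read never costs a memory load, and a register that is never touched costs nothing at all.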

[0023] When current RFSA register 204 is written to
(indicating a context switch) its current value is copied
to previous RFSA register 206. Previous RFSA register 206
is, for example, coupled to current RFSA register 204 to
receive the copied current RFSA value. Also, the values
in all registers 203 are copied, preferably in one cycle,
into shadow register file 207 over local register
connection 205. In practice, shadow register file 207 may
be implemented in a fashion so that each memory cell or
storage location in each register 203 is physically
adjacent to a corresponding memory cell or storage
location in shadow register file 207. In this manner,
memory bus 103 is not burdened with the traffic required
to copy all of registers 203 during a context switch. The
content of each register 203 is copied together with its
associated valid bit and dirty bit. Once all the
registers 203 have been copied to shadow register file
207, register file 202 can be dedicated to holding
entries from the new context.
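The sequence triggered by a write to the current RFSA register can be sketched in software as follows. The names `Entry`, `Core` and `write_current_rfsa` are hypothetical, and the single-cycle hardware copy is modelled as an ordinary list copy. Clearing the dirty bits of the main register file after the copy is an assumption of this sketch; the text above states only that the registers are marked invalid.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: int = 0
    valid: bool = False
    dirty: bool = False


class Core:
    """Toy model of register file 202, shadow file 207 and RFSA registers 204/206."""

    def __init__(self, num_registers, initial_rfsa):
        self.registers = [Entry() for _ in range(num_registers)]
        self.shadow = [Entry() for _ in range(num_registers)]
        self.current_rfsa = initial_rfsa
        self.previous_rfsa = None

    def write_current_rfsa(self, new_rfsa):
        # 1. The old pointer moves to the previous RFSA register.
        self.previous_rfsa = self.current_rfsa
        # 2. Every register is copied, with its V and D bits, into the shadow file.
        self.shadow = [Entry(e.value, e.valid, e.dirty) for e in self.registers]
        # 3. The main file is invalidated so the new context loads on demand.
        for e in self.registers:
            e.valid = False
            e.dirty = False  # assumed: the new context starts with nothing modified
        # 4. The new pointer identifies the incoming context's save area.
        self.current_rfsa = new_rfsa
```

Note that no memory traffic appears anywhere in `write_current_rfsa`; in this model, as in the description, the switch itself touches only on-chip state.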

[0024] As unused cycles become available on memory bus
103, register values stored in shadow register 207 that
are dirty, as indicated by the dirty bit, are written to
the register save area 209b in memory 107 using memory
bus 103 at a location described by or derived from the
value held in previous RFSA 206. This "lazy" write back
or commit of the register values in shadow register 207
makes efficient use of memory bus 103 and the available
memory bus cycles. This function may be implemented
without added processor instructions by hardware coupled
to current RFSA register 204 to detect a change in
current RFSA 204 value and coupled to shadow register 207
to detect whether any dirty bits are set in shadow
register 207.
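The "lazy" write back can be pictured as a step that runs once per memory bus cycle and does work only when the bus is otherwise idle. The sketch below makes several assumptions for illustration: memory is a dictionary keyed by word address, the save slot for register i is at the previous RFSA value plus i, at most one register is written per idle cycle, and the name `write_back_step` is hypothetical.

```python
def write_back_step(shadow, memory, previous_rfsa, bus_idle):
    """Write at most one dirty shadow entry to memory; return True if a write occurred."""
    if not bus_idle:
        # The bus is busy with normal traffic, so the write back waits.
        return False
    for index, entry in enumerate(shadow):
        if entry["dirty"]:
            memory[previous_rfsa + index] = entry["value"]
            entry["dirty"] = False  # slot and shadow entry are now in sync
            return True
    return False  # nothing left to commit
```

Calling this step repeatedly drains the dirty entries over time without ever competing with the executing thread for the bus.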

[0025] When a context switch occurs, two cases may
exist. In a first case, if the value of the new context in
current RFSA 204 is the same as the value in the previous
RFSA 206, the register values in shadow register 207 and
register file 202 can be swapped. In this case, it is not
necessary to complete write back of all dirty registers
in shadow register 207 to register save area 209b. To
enable swapping of the register values it may be
desirable to include a third register file to temporarily
hold the contents during swapping, although other
hardware mechanisms to implement this swapping will be
apparent to those skilled in the design of memory and
processor systems.

[0026] In the second case, the value of the new context
identified by the value stored in current RFSA 204 is
different from the value in previous RFSA register 206.
In this case, all the registers having dirty bits
currently set in shadow 207 (if any exist) must be saved
to register save area 209b in memory 107 before the
context switch takes place. In most applications, all of
the dirty registers in shadow register 207 will have
already been written back in free memory cycles. However,
in applications where context switches occur without
sufficient memory cycles to complete the write back, a
forced write back is required. If the operating
environment is such that frequent context switches occur
with insufficient free memory clock cycles to complete
the write back of dirty registers in shadow 207, it may be
desirable to steal memory cycles from the memory bus. The
stolen memory cycles will impact normal operation, but
will allow dirty registers in shadow register 207 to be
completely written back before a subsequent context
switch, resulting in a net improvement.
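The two cases can be brought together in one sketch. The state is held in a plain dictionary with the hypothetical keys `current`, `previous`, `registers`, `shadow` and `memory`; memory is a dictionary keyed by word address, and the slot for register i is assumed to lie at the save area pointer plus i. The function name `context_switch` and its return strings are illustrative only.

```python
def context_switch(state, new_rfsa):
    """Apply a context switch to the toy state; return which case was taken."""
    if new_rfsa == state["previous"]:
        # Case 1: the incoming context is still in the shadow file, so the two
        # files are swapped and no write back needs to complete first.
        state["registers"], state["shadow"] = state["shadow"], state["registers"]
        state["current"], state["previous"] = state["previous"], state["current"]
        return "swap"
    # Case 2: a different context is incoming. Any shadow entries that are
    # still dirty must be forced out to the previous context's save area.
    if state["previous"] is not None:
        for index, entry in enumerate(state["shadow"]):
            if entry["dirty"]:
                state["memory"][state["previous"] + index] = entry["value"]
                entry["dirty"] = False
    # The outgoing context moves to the shadow file with its V and D bits.
    state["shadow"] = [dict(entry) for entry in state["registers"]]
    state["registers"] = [{"value": 0, "valid": False, "dirty": False}
                          for _ in state["shadow"]]
    state["previous"], state["current"] = state["current"], new_rfsa
    return "reload"
```

The swap case is what makes a rapid back-and-forth between two threads cheap: in this model it moves no data over the memory bus at all.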

[0027] In most environments the write back of data 
from shadow register 207 to memory 107 will only write 
into a data cache. In uniprocessor environments it is ac- 
ceptable for the contents of RFSA 209b to be held in 
cache because the processor can readily access 
cached data. However, in SMP environments, all con- 
tents of shadow register 207 having the dirty bits set 
must eventually be written to memory 107 so that the 
data is available to other processors in the SMP system. 
To ensure that all the registers 203 are eventually written 
out to memory 107, it may be necessary to force a cache
write-back operation by issuing a commit instruction 
from time to time that writes all register values stored in 
cache to main memory. This commit instruction may be 
issued periodically or based upon elapsed time that the 
data has been in shadow register 207, or frequency of 
context switches, other equivalent criteria, or a combi- 
nation of these criteria. 

[0028] In a preferred implementation, the new register 
values are loaded over memory bus 103 as needed from
the associated register save area 209a in memory 107 
to register file 202 using the new value stored in current 
RFSA register 204. This is alternatively referred to as a 
"lazy" restore, a "dynamic" restore, or a "just-in-time" re- 
store. This is in contrast to existing systems that load 
the values into all of registers 203 before beginning to
process data stored in register file 202. As described 
hereinbefore, upon a context switch, the valid bits of 
each register 203 are marked invalid. As execution unit 
201 requests data for the first time from a register 203, 
the value is automatically loaded from memory 107 (be- 
cause the register 203 is marked invalid). Some delay 
can be added to the instruction execution until the re- 
stored data is ready. Until the data from a particular reg- 
ister is requested, it remains in main memory (or cache). 
This dynamic restore feature of the present invention 
saves memory bus cycles that are ordinarily used to 
transfer register values that are not used by the execut- 
ing thread. 

[0029] Because register entries are dynamically load- 
ed on use, the registers that are loaded are available for 
use very quickly, without the latency normally associated
with restoring the entire register file 202. Hence, the
dynamic loading feature enables execution unit 201 to 
process data from registers 203 faster than prior proc- 
essors. By avoiding the latency required by restoring the 
entire register file 202, the processor 102 in accordance 
with the present invention operates with greater efficien- 
cy. 

[0030] The method in accordance with the present
invention carries an added advantage in a symmetric
multiprocessing (SMP) environment. In an SMP
environment, a thread from a single threaded program may
execute on different processors, each having distinct
register file 202. In these cases a CPU may attempt to
execute a task that has not finished having its registers
written back to memory on another CPU.
[0031] In the prior art, this problem could be handled
in three ways: 1) save the registers 203 at every context
switch and just perform the lazy restore; 2) bind the
switched out thread to the processor upon which it was
executed and which holds its shadow register 207; and
3) interrupt the processor having the shadow register 207
and force that processor to write the shadow register 207
to its associated register save area 209b in memory. All
of these have disadvantages, but in accordance with a
preferred embodiment of the present invention, shadow
register file 207 will automatically synchronize with its
associated register save area 209b given sufficient time
(i.e., sufficient available memory cycles). Each register
file save area desirably includes a flag 208 comprising a
single bit to show that synchronization is complete
(i.e., all registers in shadow register file 207 having a
dirty bit have been written back to memory 107). This bit
flag 208 can be tested by a processor 102 that requests
to load the context stored in shadow register 207.
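One way the synchronization flag might be used is sketched below. The class `SaveArea` and the functions `switch_out`, `finish_write_back` and `try_load_context` are hypothetical names; the shadow file is modelled as a list of (value, dirty) pairs. The text states only that the flag shows when synchronization is complete, so the point at which the flag is cleared is an assumption of this sketch.

```python
class SaveArea:
    """Toy register save area 209 with its synchronization flag 208."""

    def __init__(self, size):
        self.values = [0] * size
        self.synchronized = True  # flag 208


def switch_out(save_area, shadow):
    # Assumed: the flag is cleared if the outgoing context left dirty registers.
    save_area.synchronized = not any(dirty for _, dirty in shadow)


def finish_write_back(save_area, shadow):
    # Commit every dirty shadow entry, then mark the save area as synchronized.
    for index, (value, dirty) in enumerate(shadow):
        if dirty:
            save_area.values[index] = value
            shadow[index] = (value, False)
    save_area.synchronized = True


def try_load_context(save_area):
    """Return the saved values if flag 208 is set, otherwise None."""
    return list(save_area.values) if save_area.synchronized else None
```

A second processor that receives `None` from `try_load_context` would wait or pick another thread, rather than interrupting the processor that still holds the shadow copy.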

[0032] Although the invention has been described 
and illustrated with a certain degree of particularity, it is 
understood that the present disclosure has been made 
only by way of example, and that numerous changes in
the combination and arrangement of parts can be re- 
sorted to by those skilled in the art without departing 
from the spirit and scope of the invention, as hereinafter 
claimed. 

Claims 

1. A computer system including a processor and a
memory, the computer system comprising:

a general register file including a plurality of
general registers holding data;
an execution unit within the processor operable
to access data from selected general registers;
a first register save area located at an address
in the memory, the first register save area storing
data held in the plurality of registers;
a current register file save area register holding
a value pointing to the location of the first
register save area in memory;
a shadow register file coupled to receive data
from the general register file and to send data
to the memory, the shadow register file having
an entry for each of the general registers;
a second register save area located at an address
in the memory, the second register save area
storing data held in the shadow register file; and
a previous register file save area register coupled
to the current register file save area register
and holding a value pointing to the location
of the second register save area in memory.

2. The computer system of claim 1 wherein the gen- 
eral register file holds data for one of a plurality of 
contexts. 

3. The computer system of claim 1 wherein each gen- 
eral register is associated with a valid bit indicating 
whether data of the associated general register is 
effective data. 

4. The computer system of claim 1 wherein each general
register is associated with a dirty bit indicating
whether data of the associated general register is
synchronized with data corresponding to the associated
general register in the register save area.

5. The computer system of claim 4 further comprising
a flag associated with each general register save area,
wherein the flag indicates when all values in the
register save area are synchronized.

6. The computer system of claim 1 further comprising
a memory bus coupling the plurality of general
registers to the memory.

7. The computer system of claim 6 further comprising 
a register bus separate from the memory bus cou- 
pling the general register file to the shadow register 
file. 

8. The computer system of claim 1 further comprising 
a register bus coupling the general register file to 
the shadow register file wherein each general reg- 
ister comprises a plurality of one-bit storage loca- 
tions and the register bus comprises a connection 
from each storage location in the general register 
file to a corresponding storage location in the shad- 
ow register file to enable the data held in all of the 
plurality of registers to be transferred to the shadow
register file in a single clock cycle. 

9. The computer system of claim 1 further comprising: 

a memory bus coupling the plurality of general 
registers to the memory, wherein the shadow reg- 
ister file sends data to the memory using the mem- 
ory bus while the memory bus is not being used to 
communicate data between the plurality of registers 
and the memory. 

10. The computer system of claim 1 further comprising:

a control connection between the shadow register file,
the current RFSA register and the previous RFSA
register, wherein the shadow register file is further
coupled to send data to the plurality of registers in
response to detecting that the current RFSA value
equals the previous RFSA value.

11. A method for operating a processor having a
plurality of registers holding data and a memory, the
method comprising the steps of:

providing a first register save area in the memory,
the first register save area holding data values
that define a first context;
providing a second register save area in the
memory, the second register save area holding
data values that define a second context;
providing the first context in the processor by
loading at least some of the data values from
the first register save area into the plurality of
registers;
storing a first pointer value which points to the
first register save area in a current RFSA register;
indicating a context switch by storing a second
pointer to the second register save area in the
current RFSA register;
transferring the first pointer from the current
RFSA register to a previous RFSA register;
transferring substantially all of the data values
that define the first context from the registers to
a shadow register file; and
providing the second context in the processor
by loading selected data values from the second
register file save area into the plurality of
registers.

12. The method of claim 11 further comprising the step
of:

providing a plurality of register file save areas
in the memory wherein each register file save area
holds data values that define a unique context, and

13. The method of claim 12 wherein the first register file
save area and the second register file save area are
selected from the plurality of register file save areas
by a thread executing on the processor.

14. The method of claim 11 wherein the step of
transferring substantially all of the data values further
comprises the step of:

copying each stored bit in the data values that
define the first context to a corresponding bit storage
location in the shadow register in a single memory
cycle.

15. The method of claim 11 further comprising the step
of:

marking each of the plurality of registers with
a valid bit indicating whether the data of the register
is effective data.

16. The method of claim 11 further comprising the step
of:

marking each of the plurality of registers with
a dirty bit indicating whether the data of the register
is synchronized with data in its corresponding
register file save area.

17. The method of claim 11 further comprising the step
of:

marking each register save area with a flag
indicating whether substantially all values in the
register file save area are synchronized.

18. The method of claim 11 wherein the step of
providing the second context further comprises loading
the data values from the second register file save
area on a register by register basis as needed by a
thread executing on the processor.

19. The method of claim 11 wherein the step of
providing the second context further comprises the steps
of:

marking each of the plurality of registers with a
valid bit indicating whether the data of the
register is effective data;
setting each of the valid bits to indicate that the
data is not effective data;
each time a register is accessed, retrieving data
from the accessed register if the valid bit
indicates that the data is effective; and
each time a register is accessed, retrieving data
from the register file save area if the valid bit
indicates that the data is not effective, filling the
register with the retrieved data, and setting the
valid bit for the register to indicate that the
register's data is now effective.

20. A computer program product for use with a general
purpose computer, said computer program product
comprising:

a computer usable medium having computer
readable program code means embodied in
said medium for operating a processor having
a plurality of registers holding data and a
memory, the computer program product having:
computer readable program code devices for
causing a computer to provide a first register
save area in the memory, the first register save
area holding data values that define a first
context;
computer readable program code devices for
causing a computer to provide a second register
save area in the memory, the second register
save area holding data values that define a
second context;
computer readable program code devices for
causing a computer to provide the first context
in the processor by loading at least some of the
data values from the first register save area into
the plurality of registers;
computer readable program code devices for
causing a computer to store a first pointer value
to the first register save area in a current RFSA
register;
computer readable program code devices for
causing a computer to indicate a context switch
by storing a second pointer to the second
register save area in the current RFSA register;
computer readable program code devices for
causing a computer to transfer the first pointer
from the current RFSA register to a previous
RFSA register;
computer readable program code devices for
causing a computer to transfer all of the data
values that define the first context from the
registers to a shadow register file; and
computer readable program code devices for
causing a computer to provide the second
context in the processor by loading selected data
values from the second register file save area
into the plurality of registers.

21. A register architecture for a data processor com- 
prising: 

a memory having a plurality of addressable 
storage locations; 

a general register file including a plurality of 
general registers; and 

a plurality of register save areas defined in the 
memory, each register save file holding data 
defining a unique context and each register ad- 
dressable to provide data to the general regis- 
ter file on a register-by-register basis. 

22. The register architecture of claim 21 further com- 
prising: 

a dirty bit associated with each general register 
indicating whether data in the register is syn- 
chronized with data in the register save area; 
and 

a register write back device to write back
only registers marked as dirty to the register
save file during a context switch.

23. A method for operating a processor having a
plurality of registers holding data defining a current
context, wherein the processor is associated with a
memory, the method comprising the steps of:

providing a plurality of register save files located
in the memory, each register save file holding
data defining a particular context;
indicating a new context by pointing to a location
of a selected register save file; and
loading the new context on a register-by-register
basis as each register is required by a thread
operating on the processor, wherein at least
one but less than all of the registers are loaded.

24. A computer data signal embodied in a carrier wave
comprising:

a first code portion comprising code configured
to cause a computer to define a plurality of
register save files located in the computer memory,
each register save file holding data defining a
particular context;
a second code portion comprising code configured
to cause a computer to indicate a new context
by pointing to a location of a selected register
save file; and
a third code portion comprising code configured
to cause a computer to load the new context on
a register-by-register basis as each register is
required by a thread operating on the computer
such that at least one but less than all of the
registers are loaded.